SUMMARY

Using twelve years of time series data on highway
fatalities, the forecasts produced by the methodology currently
employed by the National Highway Traffic Safety Administration
(NHTSA) to predict the annual (calendar year) total of highway
accident fatalities were compared with those obtained
by several computer routines based on exponential smoothing
techniques and available at the National Bureau of Standards.
The  use  of  unadjusted  and  seasonally  adjusted  data  was 
also  examined. 

It is found that there is no compelling evidence to
justify abandoning the present NHTSA methods in favor
of readily available computer routines based on exponential
smoothing methods.

Of  the  methods  examined  in  this  study,   the  best 
results  were  obtained  with  the  EXPSMOOTHING  routine 
using  unadjusted  fatality  data. 


1. The Problem: The Technical Analysis Division has
been requested to examine the National Highway Traffic
Safety Administration's (NHTSA) current methodologies for
forecasting the annual total of highway accident fatalities,
given the monthly fatalities for the first three months,
the first six months, and finally the first nine months of
the year, and to compare the results of these methodologies
with those based on readily available computer programs at
the National Bureau of Standards for time series analysis
and forecasting.

2. The Data and Auxiliary Information Available:
Data for each month are available for the twelve (12) year
period 1960 through 1971. Also available are the outputs
of NHTSA's Computer Program TIMSR4: Time Series Analysis of
Fatalities and Vehicle-mileage Data. This time series
analysis program provides 12-month moving averages, monthly
seasonal indices, seasonally adjusted data for each month
and the ordered differences between the seasonally adjusted
data and an uncentered twelve-month moving average. The
ordered differences are used to determine, for any month of
a time series, upper and lower bounds for the trend point
such that the seasonally adjusted value will fall within
those limits with a known frequency.
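As an illustration of the last two TIMSR4 outputs, the following is a minimal sketch only. The report does not give the TIMSR4 algorithm, so the function names, the `coverage` parameter and the way the bounds are read off the ordered differences are all assumptions made for this example.

```python
from typing import List, Tuple


def trailing_moving_average(values: List[float], window: int = 12) -> List[float]:
    """Uncentered (trailing) moving average.

    Entry j of the result averages values[j .. j + window - 1], so it is
    aligned with the last observation of each window.
    """
    if len(values) < window:
        raise ValueError("need at least one full window of data")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


def empirical_bounds(adjusted: List[float], window: int = 12,
                     coverage: float = 0.8) -> Tuple[float, float]:
    """Lower and upper offsets from the ordered differences.

    Assumption: the differences (adjusted value minus trailing average) are
    sorted and an equal share is trimmed from each tail, so at least
    `coverage` of the observed differences lie inside the returned limits.
    """
    averages = trailing_moving_average(adjusted, window)
    diffs = sorted(a - m for a, m in zip(adjusted[window - 1:], averages))
    n = len(diffs)
    k = int((1.0 - coverage) / 2.0 * n)  # points trimmed from each tail
    return diffs[k], diffs[n - 1 - k]
```

Adding the two offsets to the current moving average gives limits within which the seasonally adjusted value fell with the chosen frequency in the historical record.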

The TIMSR4 Program does not have an automatic
extrapolation (forecasting ahead) feature. The NHTSA methods of
forecasting that we were asked to review are discussed in
Section 5.

3. Readily Available Computer Programs: A constraint
imposed  on  the  study  was  to  use  computer  programs  that  were 
readily  available  at  the  National  Bureau  of  Standards.  Thus, 
a  comprehensive  study  of  methods  is  not  given  here. 

Several  time-sharing  interactive  systems  were  immediately 
available.     These  were  DIALCOM,  Computer  Sciences  Corporation 
Conversational  Executive  and  General  Electric  Mark  II.  Of 
these  time-sharing  programs  we  chose  to  use  the  General 
Electric  EXPSMOOTHING  routine,  because  of  the  variety 
of  options  available  and  the  existence  of  an  automatic 
extrapolation  feature. 

The  Large  Scale  Systems     STAT-PACK  for  the  UNIVAC 
1108  lists  several  routines  for  time  series  analysis.  These 
are  UNIVAC  system  subroutines  and  are  not  necessarily 
National  Bureau  of  Standards  routines,  but  they  are  available 
for  use  on  the  NBS  UNIVAC  1108.     The  most  appropriate 
subroutine  for  the  problem  at  hand  appeared  to  be 
GEXSMO  -  Generalized  Exponential  Smoothing. 

Appendix B contains a discussion of the available
computer programs at NBS as well as a discussion of the
background and details of the GEXSMO computer program. Appendices
A and C provide the same information for the EXPSMOOTHING
program.

4.  Plan  of  Analysis: 

a.     Use  the  General  Electric  -  EXPSMOOTHING  routine 
and  the  UNIVAC  STAT-PACK  GEXSMO  routine. 

b. Use both seasonally adjusted data and seasonal
factors (as determined by the TIMSR4 program)
and unadjusted data in separate runs.

c. Base forecasts on 4, 6, 8, and 11 years of
data plus data for 3, 6, and 9 months of new
data.

d. Do c. for each of the NHTSA forecasting
algorithms.

5. Production Runs: We started production runs using
the General Electric time-sharing routine EXPSMOOTHING, using
seasonally adjusted data determined by NHTSA's TIMSR4
computer program. We always chose the most complete output
option,   so  that  the  routine  always  performed  a  cyclic 
analysis:     if  a  cycle  was  indicated  by  the  routine,  the 
analysis  part  of  the  program  took  this  into  account.      (We  view 
this  as  a  not  too  interesting  bonus  of  the  routine.)  Horizon 
time  was  always  chosen  to  exceed  the  lead  time.*     The  statistical 
analysis  of  the  output  was  accomplished  by  a  mixture  of 
computer  and  hand  calculations.     The  results  of  this  group 
of  calculations  are  contained  in  Table  1. 

*In the EXPSMOOTHING Program, the lead time is the
term used to describe the time (in periods) between
the last period of input data and the period one is
most interested in estimating correctly.
The forecast will be optimized according to the lead
time that is entered. The forecast horizon indicates
how many periods beyond the last data point will
be forecasted, but has nothing to do with the optimization.
(See Appendix A).

The total amount of data available for forecasting
covered 11 years, with a minimum of 3 years of data considered
necessary for suitable application of the exponential
smoothing process. Thus, annual "forecasts" could be
made after 3, 6, and 9 months in each of 8 years. Instead
of analyzing this full set of 24* periods, half of these
were chosen, namely the three periods in those years
with 4, 6, 8, and 11 antecedent years of data.

In the headings of Table 1, and subsequent tables,
the following symbols were adopted:

$S$ = the actual value of the time series datum
(in this case, fatalities)

$\hat{S}$ = an estimate of $S$

$\tilde{S}$ = seasonally adjusted value of $S$

$\Delta = \hat{S} - S$; if $\Delta > 0$ ($< 0$) the estimate is above
(below) the actual value

$\Sigma$ = summation over the lead time

$\Sigma^*$ = summation over the calendar year (Jan. - Dec.)

$\Sigma\Delta / \Sigma S$ = (signed) fractional error over lead time

$\Sigma^*\Delta / \Sigma^* S$ = (signed) fractional error over the calendar
year (Jan. - Dec.)

MAD = mean absolute deviation

$\alpha$ = smoothing constant

$k$ = length of cycle; $k = 1$ indicates the data are not cyclic.
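The error measures defined above can be sketched in a few lines. This is an illustration only: the function names and the sample numbers are invented for the example, and the deviation is taken as estimate minus actual, following the sign convention stated for the symbol list.

```python
from typing import Sequence


def fractional_error(estimates: Sequence[float], actuals: Sequence[float]) -> float:
    """Signed fractional error: sum of (estimate - actual) over sum of actuals.

    Positive when the estimates are, in total, above the actual values.
    """
    total_deviation = sum(e - a for e, a in zip(estimates, actuals))
    return total_deviation / sum(actuals)


def mean_absolute_deviation(estimates: Sequence[float], actuals: Sequence[float]) -> float:
    """Mean of |estimate - actual| over the supplied periods."""
    return sum(abs(e - a) for e, a in zip(estimates, actuals)) / len(actuals)


# Hypothetical three-period example (not data from the report).
example_estimates = [110.0, 90.0, 105.0]
example_actuals = [100.0, 100.0, 100.0]
example_fraction = fractional_error(example_estimates, example_actuals)
example_mad = mean_absolute_deviation(example_estimates, example_actuals)
```

Summing over the lead-time periods gives the lead-time fraction; summing over January through December gives the calendar-year fraction.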

*An additional 3 periods were analyzed for
non-calendar twelve-month years, raising the potential sample
size to 27 and the number tested to 15.

One of the interesting features of Table 1 is
that there is no apparent pattern for the cycles with the
age of the data. The same holds true for the $\alpha$ value (smoothing
constant) and the order of smoothing. This led us to
duplicate these computations by suppressing the cyclic analysis.
The summary of the results of these computer runs is given
in Table 2. Here there does appear to be some reasonableness
in the variation of the smoothing constant with data age.
The comparison of the forecasting results (Table 3) shows
a slight advantage to the case of using the cyclic analysis,
but the results are not heavily in favor of the latter.

In the case of 4 years and 3 months of seasonally
adjusted data, the computer routine selected $\alpha = 0.175$
(order 2) for the optimum smoothing constant. We were
interested in determining whether the minimum was sharp
or flat, so we made runs for $\alpha = 0.170$ and $0.180$. The order
of smoothing remained at 2 and the minimum was found to be
flat; only a very slight change in forecast error fraction
could be gained by changing $\alpha$ from 0.175 to 0.170 (see Table
4). The change is from 0.0121 to 0.0109.

Table  5  presents  the  results  of  duplicate  runs  using 
unadjusted  data  in  the  EXPSMOOTHING  routine.     Cyclic  analysis 
was  permitted  and  each  time  the  routine  selected  K=12  for 
the  dominant  cycle. 

We also made matching runs starting with unadjusted data
on the UNIVAC 1108 program GEXSMO, starting with a given
cycle (seasonality) length of 12 and using 2 initial periods
to determine the initial seasonality factors and other
starting values. The GEXSMO routine always produces a
forecast for the time indicated as the length of the cycle
starting from the last input datum. In the language of the
EXPSMOOTHING routine, the lead time is 1 and the horizon
time is the length of the cycle. The summary statistics
for these runs are presented in Table 6.

It is interesting to note that in working with
unadjusted data and these two computer routines, it is possible
to  compare  the  seasonally  adjusted  data  and  seasonality 
factors  produced  by  these  routines  with  those  determined 
by  the  TIMSR4  program.     We  have  not  made  these  comparisons, 
preferring  at  this  stage  to  look  at  the  end  results. 

Table  7  presents  the  detailed  calculations  for  the 
current  methodologies  employed  by  NHTSA  to  forecast  the 
calendar  year  fatalities.     These  methods  were  developed 
in  accordance  with  descriptions  furnished  by  Mr.  Donald  F. 
Mela  in  his  letter  of  November  30,   1972,  and  have  been 
labelled  1,   2  and  3  in  the  same  order  as  described  by  Mr. 
Mela.     Higher  order  methods  are  possible  by  taking  into 
account  more  years  of  input  data  in  the  averaging  process, 
but  we  have  limited  ourselves  to  the  first  three  methods. 

The notations used in the column headings of Table 7
are:

$m_0$ = current total for specified months of fatalities

$m_{-i}$ = the corresponding total for specified months of
fatalities for the $i$th previous year

$Y_{-i}$ = the total fatalities for the $i$th previous year

$\lambda$ = leap year adjustment factor

$Y_j$ = the forecast calendar year fatalities by method $j$

With this notation the NHTSA forecast methods may be written:

Method 1: $Y_1 = (m_0 / m_{-1}) \, Y_{-1}$

Method 2: $Y_2 = \lambda Y_1$

Method 3: $Y_3 = \lambda \, (m_0 / \sum_i m_{-i}) \sum_i Y_{-i}$
(the sums run over the previous years used in the averaging)

Method k: $Y_k = \lambda \, (m_0 / \sum_{i=1}^{k} m_{-i}) \sum_{i=1}^{k} Y_{-i}$, $k > 1$
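The ratio form shared by these methods can be sketched as follows. This is a hedged illustration, not the NHTSA program: the function name, the default `leap_factor` of 1.0 and the sample figures are assumptions, and the number of previous years entering Method 3 is left to the caller because it cannot be read reliably from the formulas as reproduced.

```python
from typing import Sequence


def ratio_forecast(m0: float,
                   prior_partials: Sequence[float],
                   prior_totals: Sequence[float],
                   leap_factor: float = 1.0) -> float:
    """Scale previous annual totals by the current-to-previous partial-year ratio.

    m0             -- fatalities so far this year (first 3, 6 or 9 months)
    prior_partials -- totals for the same months in each previous year used
    prior_totals   -- full calendar-year totals for those same previous years
    leap_factor    -- leap year adjustment (value not given in the report;
                      1.0 means no adjustment)
    """
    if not prior_partials or len(prior_partials) != len(prior_totals):
        raise ValueError("need matching, non-empty lists of previous years")
    return leap_factor * (m0 / sum(prior_partials)) * sum(prior_totals)


# Hypothetical figures, for illustration only.
one_year = ratio_forecast(12000.0, [11000.0], [50000.0])
two_years = ratio_forecast(12000.0, [11000.0, 10000.0], [50000.0, 48000.0])
```

With a single previous year and `leap_factor` left at 1.0 this reduces to Method 1; supplying a factor other than 1.0 gives the form of Method 2.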

Table  8  presents  a  composite  summary  of  the  fractional 
errors  by  each  of  the  methods  considered.     In  this  table 
and  Table  9,  SAC  =  Seasonally  adjusted  data  with  cyclic 
analysis. 

Considering just the NHTSA methods, we find a gradual
improvement in the forecast with increasing order of the
method. However, in certain specific cases Method 2 gives
slightly better results than methods of order greater than
2. Generally, the error decreases with the shorter lead
time.

Comparing the four exponential smoothing methods, the
sample  evidence  seems  to  favor  EXPSMOOTHING  using  unadjusted 
data.     However,  each  of  the  four  methods  has  a  mean  absolute 
deviation  of  approximately  1  percent,  which  is  virtually 
indistinguishable  from  that  of  Method  3. 

Comparing the "best" NHTSA method (Method 3) with
the "best" exponential smoothing routine we find for
the sample of 12 trials out of the possible 24 forecasts
that Method 3 was better 5 times and the best exponential
smoothing method was better on 7 occasions. This is
obtained by comparing the entries in columns headed
$\Sigma^*\Delta / \Sigma^* S$ and Unadj C in Table 8. However, the range of the
absolute deviations of the exponential methods is smaller
than that of any of the Methods 1, 2 or 3. Thus, it
would appear that there is a small gain in going to
the exponential smoothing methods.

Our  experience  in  using  the  EXPSMOOTHING  routine  has 
made  us  wary  of  using  long  lead  times,   in  particular  L  =  12. 
In  some  cases  we  have  found  that  better  results  can  be 
obtained  in  making  forecasts  for  a  year  ahead  by  using 
L  =  3  in  place  of  12.     See  Table  9  for  the  appropriate 
comparisons. However, no clear-cut rule can be recommended.

6. Conclusions

For the forecasting problem at hand (see Section 1),
if one can live with the percentage range of errors
indicated with the present NHTSA methods, there is no
compelling evidence to justify abandoning the present
methods in favor of readily available computer routines
for exponential smoothing methods.

Of  the  methods  examined  in  this  study,  the  best  results 
were  obtained  with  the  EXPSMOOTHING  routine  using  unadjusted 
fatality  data. 

APPENDIX A

Exponential Methods for Analysis and Forecasting
of Economic Time Series


From approximately 1960, most methods of time
series smoothing analysis have gone toward the use of
varying weights based on the age of the data. The simple
moving average ignores data beyond a certain age and
weights equally each datum that is used. The most popular
weighting schemes in vogue today are based on some form
of "exponential" smoothing. These schemes use all of the
data, with the weight given an individual datum decreasing
with increasing age of the datum. The rate of decreasing
weight is determined by a parameter called a smoothing
constant and most frequently denoted by the Greek letter
alpha ($\alpha$). Depending on the nature of the time series,
a variety of models may be used to represent the time
series, for example, a constant process model $\xi(t) = a_0$,
a linear process model $\xi(t) = a_0 + a_1 t$, a quadratic process
model $\xi(t) = a_0 + a_1 t + \frac{1}{2} a_2 t^2$. In each case there
is a process $\xi(t) = \sum_{i=0}^{n} a_i t^i / i!$ which is observed in
the presence of noise, $x(t) = \xi(t) + \epsilon(t)$. The number
of "degrees of freedom" of these models is respectively
1, 2, 3, etc., and this number is sometimes referred
to as the order of the model.

Multiple Smoothing: If a constant model ($\xi_t = a$) is
used to represent the time series $x(t) = \xi(t) + \epsilon(t)$,
exponential smoothing for estimating the single coefficient
in a constant model is

$\hat{a}_t = S_t(x) = \alpha x_t + (1-\alpha) S_{t-1}(x)$

For estimating the two coefficients in a linear model,
the notion of double smoothing is used. For generality
the exponential smoothing for the constant model is
called single smoothing and is represented by
$S_t^{[1]}(x)$, i.e., $S_t^{[1]}(x) = S_t(x)$. Then double smoothing
is defined as

$S_t^{[2]}(x) = \alpha S_t^{[1]}(x) + (1-\alpha) S_{t-1}^{[2]}(x)$

Similarly, multiple smoothing of order $k$ is defined by

$S_t^{[k]}(x) = \alpha S_t^{[k-1]}(x) + (1-\alpha) S_{t-1}^{[k]}(x)$

i.e., $k$th-order smoothing is just simple exponential
smoothing applied to the results of $(k-1)$st-order
smoothing of the data.

As an example, in the case of a linear model

$\xi_t = a_0 + a_1 t$

the forecast of the value of the time series $\tau$ units
after time $T$ is given by

$\hat{x}_\tau(T) = \hat{a}_0(T) + \tau \, \hat{a}_1(T)$

where $\hat{a}_0(T)$ and $\hat{a}_1(T)$ are given by

$\hat{a}_0(T) = 2 S_T^{[1]}(x) - S_T^{[2]}(x)$

$\hat{a}_1(T) = \frac{\alpha}{1-\alpha} \left[ S_T^{[1]}(x) - S_T^{[2]}(x) \right]$
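The recursion and the linear-model forecast above can be sketched as follows. The function names are illustrative, and starting every smoothed statistic at the first observation is an assumption made here for simplicity; the choice of starting values is a separate question that this sketch does not settle.

```python
from typing import List, Optional, Sequence


def multiple_smoothing(x: Sequence[float], alpha: float, order: int,
                       start: Optional[float] = None) -> List[float]:
    """Return the smoothed statistics of orders 1..order at the last time point.

    Each order applies simple exponential smoothing to the output of the
    previous order, exactly as in the recursion for order k.
    """
    initial = x[0] if start is None else start
    stats = [initial] * order
    for value in x:
        signal = value
        for k in range(order):
            stats[k] = alpha * signal + (1.0 - alpha) * stats[k]
            signal = stats[k]  # output of order k+1 feeds the next order
    return stats


def linear_forecast(x: Sequence[float], alpha: float, tau: float) -> float:
    """Forecast tau periods past the end of x under the linear model."""
    s1, s2 = multiple_smoothing(x, alpha, 2)
    a0 = 2.0 * s1 - s2                         # current level
    a1 = alpha / (1.0 - alpha) * (s1 - s2)     # current slope
    return a0 + tau * a1
```

For a noise-free straight line the double-smoothing forecast settles on the line itself once the starting transient has died out, which makes a convenient check of the formulas.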

A convenient, comprehensive basic reference is
R.G. Brown, Smoothing, Forecasting and Prediction of
Discrete Time Series, Prentice-Hall, 1963. A summary
of the fundamental formulas of multiple smoothing and
forecasting is given on pages 142-144, and pages 184ff
of this reference.

We  remark  at  this  time  that  in  using  multiple 
smoothing  methods  two  problems  arise:      (a)  what  order 
of  smoothing  should  be  used?     (b)  what  value  of  the 
smoothing  constant  should  be  employed?     A  third,  but 
minor,  problem  is  the  determination  of  starting  values. 

The  mathematical  basis  for  the  EXPSMOOTHING  computer 
program  is  founded  on  the  concepts  of  multiple  smoothing, 
and  utilizes  the  formulas  presented  in  Brown's  book. 

A related but distinct approach forms the basis
of the UNIVAC GEXSMO program. This generalized exponential
smoothing program is based on the work of P.R. Winters,
"Forecasting Sales by Exponentially Weighted Moving
Averages," Management Science, April, 1960. The essentials
of this computer program are given in Appendix B under
the section UNIVAC 1108.

APPENDIX B

Available  Computer  Programs  at  NBS  for 
Time  Series  Analysis 

Several  time-sharing  interactive  systems  were 
immediately  available  at  NBS.     These  were  DIALCOM, 
Computer  Sciences  Corporation  Conversational  Executive 
and General Electric Mark II. In addition the Large
Scale Systems STAT-PACK for the UNIVAC 1108 lists
several routines for time series analysis. A summary
of  the  pertinent  program  libraries  of  each  of  these 
systems  follows. 

DIALCOM:
Moving Average (a simple moving average)
Autocovariance
Cross Covariance
Smoothed Series (a weighted moving average)
Seasonal Index and Cyclical Movement

Note: No exponential smoothing method is available and
none of the programs has an extrapolation feature.

Computer Sciences Conversational Executive (CSCX
Basic Library): This has a Triple Exponential Smoothing
routine called ***SMOOTH. In addition to smoothing, and
listing the smoothed values and differences, the routine
can produce a plot of the observed values and the smoothed
values (on the same graph), and an extrapolation feature
is also available.

There is no routine available for first or second
order smoothing and there is no automatic feature such
as a mean absolute deviation. The value of the smoothing
constant must be supplied by the user.

General Electric (Mark II): The EXPSMOOTHING
routine is a very comprehensive routine which performs
a sequence of exponential smoothing operations for a
number of alpha values for each order of smoothing: 1,
2, 3. The routine then selects that combination of
alpha  and  order  of  smoothing  that  has  minimum  mean 
absolute  deviation  per  data  point.     A  forecast  (extrapolation) 
option  is  also  available,  along  with  a  plot.     A  variety 
of  output  options  is  possible. 

The  routine  also  permits  the  users  to  override 
several  automatic  features  of  the  routine.     For  example, 
the  user  has  the  capability  to  specify  cycle  length 
of  known  periodic  data  and  also  to  specify  the  values 
of  the  smoothing  constants,  but  not  the  order  of  smoothing. 

A  summary  of  the  essential  details  of  this  particular 
routine  is  given  in  Appendix  C. 

UNIVAC 1108: The Large Scale Systems STAT-PACK
for the UNIVAC 1108 lists several routines for time
series analysis. These are UNIVAC system subroutines
and are not necessarily National Bureau of Standards
routines, but they are available for use on the NBS
UNIVAC 1108. The most appropriate subroutine for the
problem at hand appears to be GEXSMO - Generalized
Exponential Smoothing.* The routine

1. Produces the seasonally adjusted series point

2. Computes a seasonality factor to be used
in the next cycle

3. Computes the next trend value

4. Produces a forecast for the next period,** $\hat{S}_{t,1}$

5. Compares the forecast with the actual to get
the forecast error

$e_{t,1} = S_{t+1} - \hat{S}_{t,1}$

where $S_{t+1}$ is the $(t+1)$st time series data point.
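When all the data are used, L forecasts for future
periods may be produced by using the equation

$\hat{S}_{N,K} = (\bar{S}_N + K R_N) \, F_{N-L+K}$, $K = 1, 2, \ldots, L$ (Linear increasing)

where $\bar{S}_N$ is the last seasonally adjusted series point, $R_N$ the
last trend value and $F$ the seasonality factor, L is the
periodicity of the data, e.g., if the data
is by month, L = 12, and N is the number of elements in
the time series.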

Three  exponential  smoothing  factors,  A,  B,  and  C 

all  between  0  and  1  are  used  respectively  to  produce 

the  seasonally  adjusted  series,   to  adjust  the  seasonality 

factors  and  to  adjust  the  trend.     These  smoothing  factors 

and  initial  values  may  be  input  or  may  be  calculated  by 

the  routine. 
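The five steps and the three smoothing factors can be illustrated with a sketch of the method described in the Winters paper cited for this routine (linear trend, multiplicative seasonality). This is not the GEXSMO code: the function name, the variable names and in particular the starting values taken from the first two cycles are assumptions made for the example.

```python
from typing import List, Sequence


def winters_forecast(x: Sequence[float], L: int,
                     A: float, B: float, C: float) -> List[float]:
    """Forecast the L periods following the end of x.

    A smooths the seasonally adjusted level, B the seasonality factors,
    C the trend. Starting values come from the first two cycles (assumed).
    """
    if len(x) < 2 * L:
        raise ValueError("need at least two full cycles of data")
    mean1 = sum(x[:L]) / L
    mean2 = sum(x[L:2 * L]) / L
    trend = (mean2 - mean1) / L
    level = mean2 + trend * (L - 1) / 2.0      # level at end of second cycle
    factors = [(x[i] / mean1 + x[L + i] / mean2) / 2.0 for i in range(L)]

    for t in range(2 * L, len(x)):
        old_factor = factors[t % L]
        previous_level = level
        # Step 1: seasonally adjusted series point
        level = A * x[t] / old_factor + (1.0 - A) * (previous_level + trend)
        # Step 2: seasonality factor for use in the next cycle
        factors[t % L] = B * x[t] / level + (1.0 - B) * old_factor
        # Step 3: next trend value
        trend = C * (level - previous_level) + (1.0 - C) * trend

    n = len(x)
    # Forecast K periods ahead using the factor from one cycle earlier
    return [(level + k * trend) * factors[(n - 1 + k) % L]
            for k in range(1, L + 1)]
```

For monthly data L would be 12, and the returned list covers the twelve months after the last observation.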


*The notation used here is that employed in the UNIVAC
Programmer's Reference Manual. The mathematical background for
the method is given in Winters, P.R.: Forecasting Sales by
Exponentially Weighted Moving Averages, Management Science,
April, 1960.

**In  this  context  "next  period"  refers  to  the  next  time  point. 


The output of the GEXSMO routine is listed under
the headings:

ORIGINAL   SEASONALLY    FORECASTED   SEASONALITY   TREND   ERROR
SERIES     ADJ. SERIES   SERIES

and this is followed by

MEAN ABSOLUTE DEVIATION OF FORECAST ERRORS

PERCENTAGE ERROR

We  mention  the  availability  of  other  UNIVAC  time 
series  routines: 

1. MOVAVG - Moving Averages. This subroutine
computes the smoothing coefficients in a polynomial
model for a time series and smooths the given
time series.

2. SEASHI - Shiskin's Seasonality Factors. This
subroutine produces seasonality factors from
a time series, and smooths, detrends and
deseasonalizes the series.

3. WEMAV - Weighted Moving Averages. This subroutine
eliminates a trend from a time series by a
weighted moving average.

4. TRELS - Trend Analysis by Least Squares. This
subroutine removes trends in a time series with
a general linear model, using least squares.

5. VADIME - Variate Difference Method. This
subroutine estimates the variance of the random
component in a time series and determines a lower
limit for the degree of the polynomial which
can be used in approximating the trend.

6. TSFARG - Autoregressive Model. This subroutine
obtains the least squares coefficients in an
autoregressive time series model and produces
forward forecasts from the model.

The details of each of these seven programs are
adequately documented in the UNIVAC Programmer's
Reference Manual for STAT-PACK and are not
repeated here.

APPENDIX C
Details  of  EXPSMOOTHING  Routine 

The  EXPSMOOTHING  routine  is  a  very  comprehensive 
routine  which  performs  a  sequence  of  exponential  smoothing 
operations  for  a  number  of  alpha  values  for  each  order 
of  smoothing:     1,   2,   3.     The  routine  then  selects  that 
combination  of  alpha  and  order  of  smoothing  that  has 
minimum  mean  absolute  deviation  per  data  point.  A 
forecast   (extrapolation)   option  is  also  available,  along 
with  a  plot.     A  variety  of  output  options  is  possible. 

The  routine  also  permits  the  users  to  override 
several  automatic  features  of  the  routine.     For  example, 
the  user  has  the  capability  to  specify  cycle  length  of 
known  periodic  data  and  also  to  specify  the  values  of 
the  smoothing  constants,  but  not  the  order  of  smoothing. 

For the convenience of the reader we present a
summary of the essentials of the General Electric Time
Sharing EXPSMOOTHING Routine. The discussion is based
on experience in using the routine and explanations of
this routine from two User's Guides* prepared by
the General Electric Company, and is prepared in the
sequence provided by the version in use in December, 1972.
For additional details on program operation the reader
is referred to the user's guides.

*Mark II User's Guide: Statistical Analysis System,
General Electric Company, December, 1971 (5707.01).

GETSA$: Mark I Marketing and Economic Forecasting,
A Hands-On Users Guide, General Electric Company, 1969
(906329).

The time series data
to be subject to the EXPSMOOTHING routine is called RAW
DATA by the program. The RAW DATA may be adjusted
initially by what is called a BASE SERIES. This series
permits one to remove known distortions that one would
not expect to occur at future times. The base series
may be used to take into account discontinuities, human
judgment or the results of statistical analyses of the
data. The base series values are subtracted from the
raw data before an analysis is made and later added
back into the forecast.

To  operate  the  program,  in  addition  to  entering 
the  raw  data  and  the  base  series,  one  has  to  indicate 
two  times,    the  LEAD  TIME    (L)    and  the  FORECAST  HORIZON 
(H)   as  well  as  a  choice  of  six  output  options. 

Lead time is the term used to describe the time
(in periods) between the last period of input data
and the period one is most interested in
estimating correctly. The forecast will be optimized
according to the lead time that is entered.

The forecast horizon indicates how many periods
beyond the last data point will be forecasted, but has
nothing to do with the optimization. For example, if
the raw data is reported monthly and the user wishes
to forecast six months in the future, accurately, and
also wishes to take a casual glance at the forecast one
year ahead, a lead time of six and a horizon time of 12
would be indicated.

The  output  options  are  cumulative,   so  that  option 
code  5  gives  all  options.     The  codes  for  the  output 
options  are: 

0  =  Simple  forecast  table 

1  =  Raw  data  residue 

2  =  Raw  data  list 

3  =  Base  series  list  and  cyclic  analysis 

4  =  Cyclic  forecast 

5 = Trend and error analysis and composite forecast.

A graphical display of the raw data and forecasted
(smoothed) and extrapolated data may also be obtained.

The following discussion is based on maximum output
(Code 5). The output indicates whether the smoothing
constant was supplied or provided automatically by the
program, and lists:

a.  NUMBER  OF  RAW  DATA  POINTS 

b.  NUMBER  OF  BASE  DATA  POINTS 

c.  FORECAST  HORIZON 

d.  LEAD  TIME. 

Next,   the  RAW  DATA  are  printed  out. 

Then comes DETERMINATION OF THE OPTIMUM NUMBER OF
DATA POINTS PER CYCLE. The entries are in columns K
and ERR(K), where K refers to cycle length as expressed
in data periods, and where K runs from 1 to half the
number of data points. The ERR(K) shows a measure*
of error associated with each cycle length. The GETSA$
user's guide states:

The  lower  the  error,  the  stronger  is  the 
tendency  for  the  data  to  conform  to  a  cyclic 
pattern.     The  absolute  value  of  the  error 
term  has  little  significance  for  interpretation, 
but  the  relative  value  shows  which  cycle 
length  fits  best  and  that  length   (K  value) 
is  chosen  as  the  optimum  length  and  subsequent 
calculations  assume  a  cycle  of  that  length.... 
The optimum number of periods per cycle will
be chosen based on minimum error. That cycle
length is assumed for the remainder of the
analyses.

If  min  ERR(K)  occurs  for  K  =  1,  the  routine  will  indicate 
that  the  data  is  not  cyclic. 

Then the output indicates THE OPTIMUM PERIOD FOR
THE CYCLE and prints out the corrections that are used
to remove the cyclic effects, under the headings T and
C(T).**


*We have been unable to determine the exact nature
of this measure.

**We have been unable to determine the exact nature
of this correction.

Next there appears TABULATION OF THE ACTUAL
(UNSMOOTHED) RESIDUE. The residue at this stage contains
the corrections for base series and cyclic effects. At
this point the

$\text{Residue}(T) = \text{Raw Data Residue}(T) - \sum_{t=1}^{T} C(t-1)$,

where $C(0) = 0$.

The output then indicates the results of EXPONENTIAL
SMOOTHING APPLIED TO RESIDUE: under the headings

ALPHA     TYPE OF SMOOTHING     MEAN ABSOLUTE DEVIATION
                                PER DATA POINT

If the alpha values are not supplied but are determined
by the program, the program will select* eight (8)
values of alpha and do exponential smoothings of orders
1, 2, and 3 for each alpha. The minimum mean absolute deviation
per data point for this set of 24 smoothings then
determines the alpha value and order of smoothing that
will be employed in the so-called FORECAST section of the
program. The output headings at this stage are:

OPTIMUM ALPHA     TYPE OF SMOOTHING     MEAN ABSOLUTE DEVIATION
                                        PER DATA POINT

followed by FORECAST

TIME     RESIDUE     COMPOSITE     ACTUAL     ERROR


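*If $\alpha$ is not forced, the program will select 8 values of
$\alpha$ based on $\alpha = \frac{1}{8} \cdot \frac{2}{L+1}$. The leading term is always
$\alpha_1 = .01$. The second term is $\alpha_2 = \min(.05, \alpha)$.

If $\alpha_2 = .05$, $\alpha_3 = \alpha$, $\alpha_4 = 2\alpha$, ..., $\alpha_8 = 6\alpha$.

If $\alpha_2 = \alpha$, $\alpha_3 = 2\alpha$, $\alpha_4 = 3\alpha$, ..., $\alpha_8 = 7\alpha$.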
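The selection rule in this footnote can be sketched as below. The base value formula is a reconstruction from a poorly reproduced line and should be treated as an assumption, as should the function name and the handling of the boundary case where the base value equals .05 exactly. One point in its favour: a lead time of 9 gives a largest candidate of 0.175, the optimum quoted in Section 5 for one run, although the report does not state the lead time used there.

```python
from typing import List


def alpha_grid(lead_time: int) -> List[float]:
    """Eight candidate smoothing constants for a given lead time L.

    Base value (reconstructed, assumed): alpha = (1/8) * 2 / (L + 1).
    The first candidate is always .01, the second is min(.05, alpha),
    and the remaining six are successive multiples of alpha.
    """
    base = (1.0 / 8.0) * 2.0 / (lead_time + 1)
    grid = [0.01, min(0.05, base)]
    if base > 0.05:
        # second term is .05, so the multiples start at 1 * alpha
        multiples = range(1, 7)
    else:
        # second term is alpha itself, so the multiples start at 2 * alpha
        multiples = range(2, 8)
    grid.extend(m * base for m in multiples)
    return grid
```

Each candidate would then be tried with smoothing of orders 1, 2 and 3, giving the set of 24 smoothings from which the minimum is chosen.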

Under TIME the initial entry is at time 5 + L, where L is
the  lead  time.     Several  observations  are  required  before 
a  smoothed  average  can  be  calculated  and  this  routine 
uses  five. 

The  RESIDUE  indicated  is  the  smoothed  residue. 

Under  COMPOSITE  appear  the  results  of  putting  back 
in  the  reverse  corrections  for  cyclic  effects  and  base 
series. 

Under  ACTUAL  are  listed  the  values  that  were  originally 
listed  as  RAW  DATA. 

Under ERROR appear the results of ACTUAL minus
COMPOSITE. The entries under RESIDUE and COMPOSITE
beyond the last ACTUAL data point are based on the
exponential smoothing forecast formulas* of orders
1, 2, 3:

$F1_{t+L} = S1_t = CED1_t$

$F2_{t+L} = CED2_t + L \, (C2_t)$

$F3_{t+L} = CED3_t + L \, (C3_t) + \frac{1}{2} L^2 \, (RC3_t)$

The values that are actually used in the extrapolation
are listed under the headings:

S1      S2      S3

CED1    CED2    CED3

C2      C3      RC3

The values shown are for the last raw data point.
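The three forecast formulas can be sketched with the standard multiple-smoothing coefficient formulas from Brown's book. The report does not define CED, C and RC, so reading them as the level, slope and curvature coefficients is an assumption, as are the function names and the choice of the first observation as the starting value.

```python
from typing import Sequence, Tuple


def smoothed_statistics(x: Sequence[float], alpha: float) -> Tuple[float, float, float]:
    """Single, double and triple smoothed values at the last data point."""
    s1 = s2 = s3 = x[0]  # starting values: first observation (assumed)
    for value in x:
        s1 = alpha * value + (1.0 - alpha) * s1
        s2 = alpha * s1 + (1.0 - alpha) * s2
        s3 = alpha * s2 + (1.0 - alpha) * s3
    return s1, s2, s3


def smoothing_forecast(x: Sequence[float], alpha: float, order: int, lead: float) -> float:
    """Forecast `lead` periods past the last point with order 1, 2 or 3."""
    s1, s2, s3 = smoothed_statistics(x, alpha)
    beta = 1.0 - alpha
    if order == 1:
        return s1
    if order == 2:
        level = 2.0 * s1 - s2
        slope = alpha / beta * (s1 - s2)
        return level + lead * slope
    if order == 3:
        level = 3.0 * s1 - 3.0 * s2 + s3
        slope = alpha / (2.0 * beta ** 2) * (
            (6.0 - 5.0 * alpha) * s1
            - 2.0 * (5.0 - 4.0 * alpha) * s2
            + (4.0 - 3.0 * alpha) * s3
        )
        curvature = (alpha / beta) ** 2 * (s1 - 2.0 * s2 + s3)
        return level + lead * slope + 0.5 * lead ** 2 * curvature
    raise ValueError("order must be 1, 2 or 3")
```

On noise-free data the order 2 forecast reproduces a straight line and the order 3 forecast reproduces a parabola once the starting transient has decayed, which gives a simple way to check the coefficients.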


*These and other formulas used in the exponential smoothing
process can readily be transformed into the notation used
in Brown's book on pages 142-144 and pages 184ff.

This is followed by HORIZON FORECAST BEGINS AT TIME

and LEAST SQUARES FIT
and MEAN and VARIANCE.

The  LEAST  SQUARES  FIT  is  simply  an  equation  describing 
the  straight  line  trend  which  best  approximates    (in  the 
sense  of  least  squares)  movement  of  the  raw  data  for 
all  the  data. 

At  this  time  the  computer  will  ask  WANT  A  PLOT? 
and  WANT  DATA  STORED?     If  a  plot  is  desired,   this  routine 
will  give  on  one  graph  plots  of  the  values  listed  under 
FORECAST  COMPOSITE  and  ACTUAL. 